Flexible Provenance Tracing

Authors

  • Liwei Wang
  • Henning Köhler
  • Ke Deng
  • Xiaofang Zhou
  • Shazia Wasim Sadiq
Abstract

The description of the origins of a piece of data and the transformations by which it arrived in a database is termed data provenance. The importance of data provenance is already widely recognized in the database community. The two major approaches to representing provenance information are annotation and inversion. An annotation is metadata pre-computed to record the derivation history of a data product, whereas the inversion method finds the source data by exploiting the fact that some derivation processes can be inverted. Annotations are flexible enough to represent diverse provenance metadata, but the complete provenance data may be larger than the data itself. The inversion method is concise, using a single inverse query or function, but the provenance must be computed on the fly. This paper proposes a new provenance representation, a hybrid of the annotation and inversion methods, that aims to combine the advantages of both. The representation adapts to the storage constraint and to the response-time requirement of on-the-fly provenance inversion.

Shazia Sadiq, The University of Queensland, Australia


Similar resources

RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows

RAMP (Reduce And Map Provenance) is an extension to Hadoop that supports provenance capture and tracing for workflows of MapReduce jobs. RAMP uses a wrapper-based approach, requiring little if any user intervention in most cases, while retaining Hadoop’s parallel execution and fault tolerance. We demonstrate RAMP on a real-world MapReduce workflow generated from a Pig script that performs senti...


Logical Provenance in Data-Oriented Workflows (Long Version)

We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for general transformations, introducing the notions of correctness, precision, and minimality. We then determine when properties such as correctness...


Tracing where and who provenance in Linked Data: A calculus

Linked Data provides some sensible guidelines for publishing and consuming data on the Web. Data published on the Web has no inherent truth, yet its quality can often be assessed based on its provenance. This work introduces a new approach to provenance for Linked Data. The simplest notion of provenance – viz., a named graph indicating where the data is now – is extended with a richer provenanc...


Provenance for Generalized Map and Reduce Workflows

We consider a class of workflows, which we call generalized map and reduce workflows (GMRWs), where input data sets are processed by an acyclic graph of map and reduce functions to produce output results. We show how data provenance (also sometimes called lineage) can be captured for map and reduce functions transparently. The captured provenance can then be used to support backward tracing (fi...


Provenance and Case-Based Reasoning

Computational science takes a multidisciplinary approach to scientific investigation, tightly linking scientific research with computational studies and processes such as numerical simulation, data management, and visualization to study complex phenomena such as weather systems. The scientific importance of such processes has led to significant interest in recording the provenance of the data p...



Journal:
  • IJSSOE

Volume 2

Publication date: 2011